Performance Appraisal from a Process Perspective:

A  Final  Report 

Performance  appraisal  systems  play  a  central  role  in  the 
active  functioning  of  any  large  organization.  The  importance  of 
such  systems  has  become  more  evident  as  a  result  of  an  acute 
awareness  of  the  need  for  organizations  to  apply  personnel  practices 
with  regard  to  promotions,  raises,  job  assignments,  and  other 
actions. The need for effective performance appraisal systems also
increases as organizations become larger and more complex. Greater
size and complexity reduce the percentage of the workforce that can
be known well by any particular manager and the percentage of the
total set of tasks done by employees that a manager in any area of
specialization can understand well and validly evaluate.

In spite of the relevance of and increasing demand for effective
performance appraisal systems, by the late 1970s the ability to
perfect  these  systems  seemed  to  have  reached  a  plateau — and  a 
relatively  low  one  at  that.  For  the  most  part,  work  on 
performance  appraisal  up  to  that  time  had  focused  upon  (1)  the 
design  of  performance  appraisal  instruments  or  scales,  and  (2)  the 
training  of  people  to  use  the  scales. 

In  a  watershed  review  of  the  performance  appraisal  research 
through the 1970s, Landy and Farr (1980) noted the limitations of
past research and stressed the need for future research that
shifted  the  concern  from  rating  scales  and  training  to  an 
investigation  of  the  cognitive  processes  involved  in  the  rating 
task itself. Over the last six years, a great deal of research and
theory  has  been  driven  by  the  orientation  suggested  by  Landy  and 
Farr. 

The  present  research  effort  was  framed  within  the  cognitive 
perspective.  The  research  was  guided  by  a  rather  detailed  model 
developed  by  Ilgen  and  Feldman  (1983).  The  general  framework  of 
the  model  suggested  that  the  rating  task  involved  four  primary 
subtasks. These were to (1) gather information about the ratee's
performance  by  observing  that  person's  behavior  on  the  job,  (2) 
store  that  information  in  memory,  (3)  retrieve  information  from 
memory  when  asked  to  rate  performance,  and  (4)  make  an  evaluation 
of  performance  based  on  the  information  retrieved  from  memory. 

Most  of  the  research  supported  by  this  grant  addressed  one  or 
more  of  the  four  subtasks  described  above  in  an  attempt  to  better 
understand  the  way  in  which  raters  process  information  and  make 
performance  appraisal  ratings.  Although  a  number  of  research 
methods  were  used,  a  large  number  of  the  studies  involved 
developing video tapes of persons working on a job. The
development of such tapes was extremely time consuming but
nevertheless important, for the tapes provided a constant stimulus
with known properties that could be presented to raters, allowing
for an assessment of the effects of the known information on
ratings. In some cases, the video tape stimuli were used in work
simulations  conducted  in  the  laboratory,  and,  in  other  cases,  the 
tapes  were  transported  to  field  settings  where  experienced  raters 
were used in the research. In all cases, the use of such materials
provided  a  valuable  method  for  assessing  the  accuracy  of  ratings. 

The  specific  studies  directed  at  one  or  more  of  the  four 
subtasks of the Ilgen and Feldman (1983) model will be mentioned in
the paragraphs that follow. There were, however, a few studies
that  did  not  fit  neatly  into  the  subtasks.  The  first  of  these  was 
a  study  of  the  effects  of  allowing  people  to  choose  performance 
feedback  rather  than  have  it  given  to  them  automatically  (Ilgen  & 
Moore,  1983).  This  research  showed  that  giving  people  a  choice  of 
whether  or  not  to  receive  feedback  can  be  very  useful  when  the  act 
of  giving  feedback  is  time  consuming  and  performing  the  task  in  a 
timely manner is important. Those persons with higher ability
chose  feedback  less  frequently  and,  as  a  result,  were  able  to  do 
the  task  more  quickly. 

A  second  tangential  piece  by  Ilgen  and  Wiggins  (1985)* 
explored,  from  a  theoretical  standpoint,  the  effects  of  time  on 
goals  and  goal  setting  processes.  This  discussion  considered  the 
role  of  performance  feedback  and  changing  motivation  on  performance 
as  well  as  the  level  of  goals  maintained  by  persons  who  perform 
similar  tasks  for  a  relatively  long  period  of  time. 


* Several of the research studies were first published as technical
reports and later as articles or book chapters. For convenience,
only the technical reports will be used for citation in this
report.

The remainder of the published research on this project
addressed  one  or  more  of  the  rater  appraisal  tasks.  Each  study  is 
briefly  mentioned  below.  In  addition,  all  published  materials  on 
the grant up to this time are listed in an appendix to this report.

Research on the Appraisal Process

Information  Gathering.  Two  studies  dealt  directly  with 
information gathering. The first of these (Favaro & Ilgen, 1983)
varied the type of information available about ratees and observed
the amount of time that raters spent observing ratee performance.
The results indicated that information that allowed raters to form
a general impression of the ratee decreased the amount of time that
the ratees were observed. This occurred even when the general
impression was one that was not perceived as providing any cues
about  performance.  It  was  suggested  that  when  the  information  was 
performance  relevant,  the  effect  should  be  stronger  and  could 
potentially  impact  negatively  on  those  people  for  whom  negative 
stereotypes  about  their  performance  exist  in  the  rater  population. 

A  second  study  of  information  gathering  by  Youtz  and  Ilgen 
(1986)  provided  information  in  a  dynamic  mode  by  creating  different 
levels  of  performance  among  ratees  observed  over  time.  It  was 
expected  that  consistent  performers  would  lead  raters  to  feel  that 
they knew and understood how well these individuals were performing,
thus decreasing the time that the raters devoted to observing
performance at a later time. The data did not support this
hypothesis. The lack of support was believed to be due, in part,
to  the  level  of  performance  in  addition  to  its  consistency. 

Storage.  A  study  by  Pulakos  (1984)  investigated  the 
interaction  between  rating  scale  format  and  the  tasks  of  gathering 
information and storing it in memory. In particular, Pulakos
argued that some rating scales place great demands on information
gathering if they are to be used effectively. Other scales affect
encoding. Pulakos employed two commonly used rating formats and
provided training on both information gathering and encoding/memory.
The results showed that scales do demand very different
processes  from  raters  and  that  ratings  are  more  accurate  when 
training  for  a  scale  focuses  on  the  information  processing  demands 
implicit  in  the  use  of  the  scale. 

In  two  studies  directly  addressing  information  processing, 
Ostroff and Ilgen (1985a, 1985b) explored the nature of the
cognitive categories used to store information about employee
performance. In a study using a sample of nurses and a video tape
of a nurse performing typical nursing tasks, raters provided a
description of the dimensions on which they, themselves, evaluated
nurses and people in general. Results indicated that ratings were
better when the personal dimensional system of the raters either
matched or was highly consistent with the dimensions of the rating scale.
There  was  also  a  slight  indication  that  providing  people  with 
feedback  on  the  match  between  their  own  personal  system  and  that  of 
the rating system may have been helpful.

Research on recall and evaluation focused on measures of
accuracy (Youtz & Ilgen, 1986) and on rating errors (Pulakos &
Schmitt, 1984; Kozlowski, Kirsch, & Chao, 1985). The first of
these studies provided an evaluation of behavioral and
classification accuracy measures, while the latter studies looked
at halo errors.

Conclusions 

The  research  supported  on  the  grant  provided  one  of  the  first 
sustained  research  efforts  to  investigate  performance  appraisal 
processes  as  they  relate  to  the  accuracy  of  ratings.  The  work  on 
the  information  gathering  stage  of  this  process  produced  perhaps 
the clearest findings, indicating that conditions do exist that
influence the amount of time people spend observing the behaviors
of others and suggesting ways to modify conditions or train
individuals to ensure more adequate sampling of behavior prior to
rating. 

The  research  on  cognitive  category  systems  used  in  rating  was 
interesting from the standpoint that it represented one of the
first attempts to assess the nature of the category systems
used by raters in field settings. Prior to this time, inferences
were  made  about  the  systems  in  terms  of  how  they  impacted  on 
performance  evaluations,  but  there  were  no  attempts  to  assess  these 
directly.  On  the  other  hand,  the  data  from  the  present  research 
were  sufficiently  unclear  as  to  leave  a  number  of  questions  with 
respect to the nature of the category systems that raters possess
and  the  effects  of  these  categories  on  ratings. 

Information  regarding  recall  was  gained  primarily  with  respect 
to  ways  to  assess  accuracy  directly  and  with  respect  to  rating 
errors.  The  accuracy  research  was  most  useful  with  respect  to 
indexing  behavioral  and  classification  accuracy.  The  rating  error 
research  focused  on  halo. 

Finally,  conducting  the  research  revealed  some  things  about 
the  nature  of  the  experimental  paradigms  used  by  us  and  by  most 
others  currently  addressing  performance  appraisal  processes.  Ilgen 
and  Favaro  (1985)  and  Ilgen  (1986)  discussed  some  of  the  boundary 
conditions  that  appear  to  be  necessary  for  research  that  is 
conducted  in  the  laboratory  for  the  purpose  of  learning  about  the 
process  of  performance  appraisals  done  in  the  field.  The  major 
point of this research was that, for laboratory findings to
transfer to the field, some minimum conditions must be met, and
many of the social psychological
research  studies  from  which  constructs  are  borrowed  and  adapted  do 
not  meet  the  minimum  conditions. 

Ostroff  and  Ilgen  (1985a)  suggested  that  research  using  the 
typical  paradigm  for  assessing  performance  appraisal  accuracy  may 
severely  underestimate  the  size  of  the  effects  due  to  restrictions 
in  variance  on  the  criterion  measure — the  measure  of  accuracy. 
Typical  accuracy  measures  have  expert  judges  rating  video  tapes  in 
order  to  obtain  a  standard  of  performance  based  on  the  mean  rating 
of  the  judges.  If  the  experts  do  not  agree,  the  video  tapes  are 
rerun until the episodes on tape produce high agreement among the
judges.  It  was  argued  that  this  process,  necessary  for  confidence 
in  the  quality  of  the  standard,  is  also  likely  to  produce  episodes 
on tape that are quite easily judged by any judge, including a naive
one.  If  this  is  so,  there  is  likely  to  be  little  variance  in 
accuracy  measures  when  the  measures  are  based  on  some  level  of 
agreement  between  naive  subjects'  ratings  and  those  of  the  experts. 
The authors raised this problem without offering a good
solution.  However,  it  is  suggested  that  future  research  needs  to 
look  closely  at  this  potential  problem  and  deal  with  it  if  the 
paradigm  is  to  be  useful. 
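
The variance restriction described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the standard for each taped episode is the mean of the expert judges' ratings and that a rater's accuracy is the negative mean absolute deviation from that standard. The function names and all ratings are invented for illustration and are not data from the studies reported here.

```python
from statistics import mean, pvariance

def standard_scores(expert_ratings):
    """Mean expert rating per episode.

    expert_ratings[j][e] is judge j's rating of episode e (hypothetical layout).
    """
    n_episodes = len(expert_ratings[0])
    return [mean(judge[e] for judge in expert_ratings) for e in range(n_episodes)]

def accuracy(rater_ratings, standard):
    """Negative mean absolute deviation from the standard; 0 means perfect agreement."""
    return -mean(abs(r - s) for r, s in zip(rater_ratings, standard))

# Hypothetical ratings on a 7-point scale for three taped episodes.
# The experts agree closely, as the tape-selection procedure requires.
experts = [[1, 4, 7], [1, 4, 7], [2, 4, 6]]
standard = standard_scores(experts)

# Hypothetical naive raters: if the episodes are easy to judge,
# their ratings also fall close to the standard.
naive_raters = [[1, 4, 7], [2, 4, 7], [1, 5, 6]]
scores = [accuracy(r, standard) for r in naive_raters]

# Little spread in the accuracy scores leaves little variance
# for any experimental manipulation to explain.
spread = pvariance(scores)
print(scores, spread)
```

When every rater lands near the standard, the accuracy scores cluster together, which is the restriction in criterion variance that the paragraph above warns about.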